ABSTRACT

We consider the problem of dynamic optimal investigation of a two-state (in control, out of control) system. The true state can only be inferred from reported costs and the time since the last correction. It is demonstrated that when the parameters satisfy certain conditions, such problems can be efficiently solved as regenerative stopping problems. Some general results for regenerative stopping problems are also obtained. In the last section the problem is generalized to n two-state systems. By combining simulation with the Regenerative Stopping Algorithm, a problem with 20 state variables is solved with a small error term.


1.  INTRODUCTION 


The decision-maker is often faced with the problem of examining financial reports to determine if corrective action should be taken. The situation is probabilistic in that disappointing performance does not necessarily mean that corrective action should be taken, since the disappointing performance may be caused by some uncontrollable and nonrecurring factor. This problem area has attracted the attention of Buckman [6], Dittman and Prakash [9], Dyckman [10], Ozan and Dyckman [15], and Magee [13]. In [12] Kaplan has a survey of this area which includes some industrial engineering models. Mathematically these problems are similar to some of the maintenance models, and the reader is directed to Pierskalla and Voelker [16] for a survey of the maintenance model literature.

The approach of Kaplan [11] is dynamic and allows the decisions each period to depend on our current estimate of the probability that the system is in control. Since the true state of the world is not known, we have a difficult optimization problem which would seem to permit only the two-state (in control, out of control) model considered by Kaplan, and even this model can be very demanding computationally. Magee [13] considers Kaplan's two-state model, and allows costs to be normally distributed so that the true state can take on all values on subsets of (0,1). Because of the difficulties of computing an optimal rule, Magee proposes seven plausible rules which he compares by simulation. The non-dynamic formulation of Ozan and Dyckman [15] allows a much more detailed model which is solved by linear programming rather than dynamic programming. For example, cost variances are permitted to have a number of possible causes.

We consider the two-state Kaplan model and solve it as a regenerative stopping problem. However, unlike Kaplan's value iteration solution procedure or the policy iteration procedures mentioned below, we must require that the parameters of the model satisfy certain restrictions. Regenerative stopping problems were formulated as such independently by Brender [4] and Breiman [3]. Algorithms for such problems are not fully developed, and this paper gives some new results. Besides being more efficient than Kaplan's algorithm, our procedure is evidently more efficient than the policy iteration methods of Brown [5] and Sondik [19], which can also be applied to Kaplan's model.

In the last section we generalize the Kaplan model to n dimensions by considering n independent processes which are related in that corrective action is taken for all n or for none of them. As n increases the problem rapidly becomes intractable for all procedures including ours. However, by using simulation techniques, the n state variable problem can be solved by the regenerative stopping algorithm. This introduces an error factor, but 20 state variable problems are solved, and policies are obtained which come within .00003 percent of the true minimum cost.

2.  THE  KAPLAN  MODEL

We consider a two-state production system where state 1 means the system is "in control" and state 2 means the system is "not in control." We let $y_i$ be the random variable which represents the reported costs in period i. When the system is in state k, k = 1,2, the probability mass function of reported costs is $f_k(y)$. Presumably $f_1(y)$ has most of its probability at low costs and $f_2(y)$ has most of its probability at higher costs. We let $m_1$ and $m_2$ be the means of the two distributions.

When in state 1 there is a probability p of staying in state 1 and probability (1-p) of going to state 2. This move to state 2 is assumed to take place late enough so that reported costs are determined by the state at the beginning of the period and therefore are described by $f_1$.^1 The cost in the given period and the state the system goes to in the next period are assumed to be conditionally independent given the state at the beginning of the period. When in state 2 the system will stay there if no corrective action is taken. Therefore, the system can be represented by a two-state Markov process whose one-step transition matrix is:

$$\begin{pmatrix} p & 1-p \\ 0 & 1 \end{pmatrix}$$


Investigation and correction may be taken at the beginning of any period at a cost K, and the system goes to state 1 instantaneously. The cost K is incurred even if the system were already in state 1 and there was no correction.

The state of the system is not known to the decision-maker except after an investigation, and it can only be inferred from the operating costs and the length of time since corrective action. What is actually determined is the probability that the system is in state 1. This is carried out by the following Bayesian procedure. Let $x_i$ be the probability that the system is in state 1 at the beginning of period i before knowing the reported costs. During period i the reported costs are $y_i = y$, and after obtaining the reported costs we make a revised estimate $x_i'$ of the probability that the system is in state 1 at the beginning of period i, and

$$x_i' = \frac{x_i f_1(y)}{x_i f_1(y) + (1-x_i) f_2(y)}. \qquad (1)$$


^1 Kaplan's position on this issue is unclear. On page 33 he says that the move is early enough in the period to affect operating costs. On pages 35 and 36 costs are based on the ending state immediately following an investigation, but are based on the beginning state otherwise. The latter is consistent with the Bayesian equation on page 34, which holds only if costs are based on the beginning state.
As stated earlier, equation (1) assumes that if there is a transition from state 1 to state 2 the costs are determined by the state at the beginning of the period, state 1. The value of $x_{i+1}$ is then

$$x_{i+1} = p\,x_i' = \frac{p\,x_i f_1(y)}{x_i f_1(y) + (1-x_i) f_2(y)}, \qquad (2)$$

which is the same as the equation on page 34 of Kaplan [11].
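The Bayesian revision (1) and the one-step prediction (2) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the cost distributions and the value of p are assumptions borrowed from the numerical example of Section 4.

```python
def revise(x, y, f1, f2):
    """Equation (1): probability of state 1 after observing reported cost y."""
    numerator = x * f1[y]
    return numerator / (numerator + (1.0 - x) * f2[y])


def next_belief(x, y, f1, f2, p):
    """Equation (2): probability of state 1 at the start of the next period."""
    return p * revise(x, y, f1, f2)


# Assumed data (taken from the Section 4 example, not from this section).
F1 = {0: 0.6, 6: 0.4}   # cost distribution when in control
F2 = {0: 0.5, 6: 0.5}   # cost distribution when out of control
P_STAY = 0.9            # probability of staying in control

x_after_zero = next_belief(0.9, 0, F1, F2, P_STAY)
x_after_six = next_belief(0.9, 6, F1, F2, P_STAY)
```

A low reported cost raises the revised estimate and a high one lowers it, while the factor p always pulls the next-period probability down.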

Since the true state cannot be observed directly, the state variable of our Markov decision problem is $x_i$. Two decisions are available: decision 1, which is to do nothing, and decision 2, which is to investigate and correct if necessary. The one-period expected costs are given by

$$C(x_i,1) = x_i m_1 + (1-x_i) m_2 \qquad (3)$$

$$C(x_i,2) = K + m_1 \qquad (4)$$

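Let $d_i$ be the decision in period i. Then the state transition function is given by

$$x_{i+1} = \begin{cases} p\,x_i' & \text{if } d_i = 1 \text{ and } y_i = y, \\ p & \text{if } d_i = 2. \end{cases} \qquad (5)$$

The probability that $y_i = y$ is $f_1(y)\,x_i + f_2(y)\,(1-x_i)$.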

The objective is to determine a decision rule (a specified decision for each possible state and period) which minimizes the expected discounted cost over an infinite planning horizon. The discount factor $\alpha$, $0 < \alpha < 1$, will represent the time value of money to the firm.

Kaplan solved his problem by solving for $F_1$, the optimal return function with one period to go, for all $0 \le x \le 1.0$, then for $F_2$, the optimal return function with two periods to go, and then for $F_3$, etc. The convergence of the $F_n$ is infinite, and in value iteration the repetition of a policy (the same decision rule for $F_n$ as for $F_{n+1}$) does not imply that the policy is optimal. Although these difficulties are not discussed by Kaplan, presumably one could use the ideas found in Porteus [17] to determine when an optimal policy has been found.
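For orientation, value iteration on equations (3)-(5) can be sketched as follows. This is not Kaplan's code: the grid over x, the linear interpolation, the fixed number of sweeps, and all names are assumptions made for illustration, and the data are those of the Section 4 example.

```python
# Assumed data: the two-outcome example of Section 4.
ALPHA, P_STAY, K = 0.98, 0.9, 1.0
M1, M2 = 2.4, 3.0                      # mean cost in control / out of control
F1 = {0: 0.6, 6: 0.4}
F2 = {0: 0.5, 6: 0.5}
N = 200                                # grid resolution, an arbitrary choice
GRID = [i / N for i in range(N + 1)]


def interpolate(values, x):
    """Linear interpolation of grid values at a probability x in [0, 1]."""
    pos = min(x * N, N - 1e-9)
    i = int(pos)
    w = pos - i
    return (1.0 - w) * values[i] + w * values[i + 1]


def continue_value(values, x):
    """Cost (3) plus the discounted expected value of the next state (5)."""
    total = x * M1 + (1.0 - x) * M2
    for y in F1:
        prob = x * F1[y] + (1.0 - x) * F2[y]
        total += ALPHA * prob * interpolate(values, P_STAY * x * F1[y] / prob)
    return total


def value_iteration(sweeps=1500):
    """Successive approximations F_1, F_2, ... on the grid."""
    values = [0.0] * (N + 1)
    for _ in range(sweeps):
        # Investigate: cost (4), then the state moves to p by (5).
        stop = K + M1 + ALPHA * interpolate(values, P_STAY)
        values = [min(continue_value(values, x), stop) for x in GRID]
    return values


F = value_iteration()
stop_value = K + M1 + ALPHA * interpolate(F, P_STAY)
stop_states = [x for x in GRID if continue_value(F, x) >= stop_value]
```

Every sweep recomputes the return function at every grid point, which is the computational burden the regenerative stopping approach is meant to avoid.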


It is interesting that Kaplan's problem can also be solved by policy iteration, and the methods are finite for Kaplan's problem when the number of realized costs is finite. Either Brown's method of recursive sets of rules [5] or Sondik's "finitely transient" procedure [19] can be used, and Sondik points out their similarity. The computational requirements of Brown's method when applied to an example problem are discussed in Section 4.


3.  REGENERATIVE  STOPPING  PROBLEMS 


We propose to solve the Kaplan model as a regenerative stopping problem. Since regenerative stopping problems are optimal stopping problems which regenerate or recommence upon stopping, we begin by describing optimal stopping problems.

The general optimal stopping problem is described at the beginning of Chapter 3 of Chow, Robbins, and Siegmund [7]. We will be less general and consider a stopping problem which is a Markov decision problem with the set of possible states X. In the Kaplan model X is the interval [0,1]. When the state is x, one decision is to continue and receive a cost C(x,1) and go to the next state according to the probability mass function $p_{xz}$, $z \in T_x$, where $T_x$ is the countable set of new states which can be reached from x. The other decision is to stop and receive a cost of C(x,2). The problem ends immediately with the decision stop. Costs are discounted by a discount factor $\alpha$. Our objective is to determine a decision rule which minimizes the expected discounted cost incurred up to and including stopping, where the initial state is $x^0$. We assume that the costs C(x,1) and C(x,2) are bounded in absolute value as x varies over X.



In order for the optimal stopping problem to have an easily computable solution it is necessary that it satisfy the Monotone Condition.

Monotone Condition. Let $B = \{x : C(x,2) \le C(x,1) + \alpha \sum_{z \in T_x} C(z,2)\,p_{xz}\}$. Then

$$\sum_{z \in T_x \cap B} p_{xz} = 1 \quad \text{if } x \in B.$$


The set B represents precisely those states where stopping is at least as good as continuing exactly one more period and then stopping. In problems with enough structure that the monotone condition holds, the set B is often very easy to determine. The monotone condition says that when the system reaches the set B it stays in B.

Clearly if $x \notin B$, then the optimal decision is to continue. It also would seem to be true that if $x \in B$ and the monotone condition holds then the optimal decision is to stop. This needs to be proven and is known as the Monotone Stopping Theorem (Chow, Robbins, and Siegmund [7, Theorem 3.3], Ross [18, Theorem 6.14]). In addition to the monotone condition, some additional technical conditions are needed to prove the theorem, which vary with the particular form of the optimal stopping problem considered. Ross's proof [18, Theorem 6.14] applies to our formulation once his stability condition is verified. The stability condition is that $G_n$, the optimal return function when there are n periods to go, converges to G, the optimal return function for the infinite horizon case. We have assumed that C(x,1) and C(x,2) are bounded, so it is well known (Denardo [8, Theorem 4]) that $G_n$ converges to G, and therefore the Monotone Stopping Theorem does apply to our formulation with bounded costs.

Theorem 1. (The Monotone Stopping Theorem) Suppose that the monotone condition holds. Then the optimal decision rule is to continue if $x \notin B$ and to stop if $x \in B$.
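For a problem with finitely many states, the set B and the monotone condition can be checked mechanically. The sketch below is illustrative only: the three-state data and the function names are hypothetical and are not taken from the paper.

```python
def one_step_set(cont_cost, stop_cost, trans, alpha):
    """States where stopping now is at least as good as continuing once and then stopping."""
    B = set()
    for x in cont_cost:
        look_ahead = cont_cost[x] + alpha * sum(
            pr * stop_cost[z] for z, pr in trans[x].items())
        if stop_cost[x] <= look_ahead:
            B.add(x)
    return B


def monotone_condition_holds(B, trans):
    """True if every transition of positive probability out of B stays in B."""
    return all(z in B for x in B for z, pr in trans[x].items() if pr > 0)


# Hypothetical data: a unit that deteriorates from state 0 toward state 2.
ALPHA = 0.9
CONT = {0: -1.0, 1: 1.0, 2: 4.0}           # cost of continuing one period
STOP = {0: 4.0, 1: 4.0, 2: 4.0}            # cost of stopping
TRANS = {0: {0: 0.7, 1: 0.3}, 1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}}

B = one_step_set(CONT, STOP, TRANS, ALPHA)
holds = monotone_condition_holds(B, TRANS)

# If state 2 could jump back to state 0, the system could leave B.
TRANS_BAD = {0: {0: 0.7, 1: 0.3}, 1: {1: 0.6, 2: 0.4}, 2: {0: 0.5, 2: 0.5}}
holds_bad = monotone_condition_holds(
    one_step_set(CONT, STOP, TRANS_BAD, ALPHA), TRANS_BAD)
```

When the check succeeds, Theorem 1 gives the optimal rule directly: stop exactly on B.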

The Monotone Stopping Theorem is important computationally since the optimal policy is known once B is determined, and there is no need for the recursive dynamic programming calculations. It is an example of a myopic policy being optimal, and the simplification derived from myopic policies is well known.

Regenerative stopping problems differ from optimal stopping problems in that when we stop and receive a cost of C(x,2), the system moves instantaneously to the initial state $x^0$, and the problem continues. Breiman [3] called such processes "Binary Decision Renewal Problems." Our criterion will be to minimize the expected discounted cost over an infinite horizon. Solving such problems, and in particular the Kaplan model, is the topic of the next section.

4.  A  COMPUTATIONAL  PROCEDURE 


Regenerative stopping problems were first formulated independently by Brender [4] and Breiman [3], and both showed that regenerative stopping problems could be solved by solving the right stopping problem. They considered the nondiscounted case with the average cost per period criterion. For the discount case, this basic theorem is stated in Bell [1, Theorem 1], who attributes it to Taylor [20]. However, Theorem 4 of [20] is for the average cost criterion, and therefore a proof for the discount case will be supplied. This result suggests the strategy of solving regenerative stopping problems by solving a sequence of stopping problems ending with the right stopping problem, but neither Brender nor Breiman considered this idea.

To state the theorem we need the idea of a $\lambda$-stopping problem. A $\lambda$-stopping problem is a stopping problem where the cost of continuing is changed from C(x,1) to C(x,1) - $\lambda$ and the cost of stopping C(x,2) is left unchanged. In the discount case Theorem 2 gives us the interpretation of $\lambda^*$, the "right" $\lambda$, as $(1-\alpha)\,F^\alpha(x^0)$, where $F^\alpha(\cdot)$ is the optimal return function of the regenerative stopping problem with a discount factor of $\alpha$. In the average cost per period case, $\lambda^*$ stands for the average cost per period. In the finite state model, it is known (Blackwell [2, Theorem 4]) that $(1-\alpha)\,F^\alpha(\cdot)$ converges to the vector of average costs per period as $\alpha \to 1$.

Let $G(x,\lambda)$ be the optimal return function of the $\lambda$-stopping problem from the initial state x. The initial state $x^0$ is important enough that we introduce the function V defined by $V(\lambda) = G(x^0,\lambda)$.

Theorem 2. If $\lambda^*$ satisfies $V(\lambda^*) = 0$, then the optimal decision rule for the $\lambda^*$-stopping problem is the optimal decision rule for the regenerative stopping problem. Also $\lambda^* = (1-\alpha)\,F^\alpha(x^0)$.

Proof. It is convenient to drop the superscript $\alpha$ and let $F = F^\alpha$, since $\alpha$ is fixed in what follows. The equations of optimality are

$$G(x,\lambda^*) = \min\Bigl(C(x,1) - \lambda^* + \alpha \sum_{z \in T_x} G(z,\lambda^*)\,p_{xz},\; C(x,2)\Bigr) \qquad (6)$$

and

$$F(x) = \min\Bigl(C(x,1) + \alpha \sum_{z \in T_x} F(z)\,p_{xz},\; C(x,2) + F(x^0)\Bigr). \qquad (7)$$

There is no discount factor before $F(x^0)$ in equation (7) since the move to $x^0$ is instantaneous with the decision stop. We want to show that $\bar F(x) = G(x,\lambda^*) + \lambda^*/(1-\alpha)$ satisfies (7). From (6) we have that

$$\bar F(x) - \lambda^*/(1-\alpha) = \min\Bigl(C(x,1) - \lambda^* - \alpha\lambda^*/(1-\alpha) + \alpha \sum_{z \in T_x} \bar F(z)\,p_{xz},\; C(x,2)\Bigr).$$

By cancelling and adding $\lambda^*/(1-\alpha)$ to all terms we have

$$\bar F(x) = \min\Bigl(C(x,1) + \alpha \sum_{z \in T_x} \bar F(z)\,p_{xz},\; C(x,2) + \lambda^*/(1-\alpha)\Bigr).$$

By hypothesis $G(x^0,\lambda^*) = 0$, so that $\bar F(x^0) = \lambda^*/(1-\alpha)$ and $\bar F$ satisfies (7). By the uniqueness of the solution of the equation of optimality in the discount case, $\bar F = F$ and $\lambda^* = (1-\alpha)\,F(x^0)$. Furthermore, the two equations below (7) establish that the same decisions which optimize (6) for each state x optimize (7), which completes the proof.
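Theorem 2 can be checked numerically on a small finite-state problem. The sketch below is an illustration under stated assumptions: the three-state data, the use of value iteration for both problems, and the bisection on $\lambda$ are choices made here, not the paper's procedure.

```python
ALPHA, X0 = 0.9, 0
CONT = {0: 1.0, 1: 3.0, 2: 6.0}            # hypothetical C(x,1)
STOP = {0: 4.0, 1: 4.0, 2: 4.0}            # hypothetical C(x,2)
TRANS = {0: {0: 0.7, 1: 0.3}, 1: {1: 0.6, 2: 0.4}, 2: {2: 1.0}}


def expect(values, x):
    return sum(pr * values[z] for z, pr in TRANS[x].items())


def solve_stopping(lam, sweeps=500):
    """Value iteration on equation (6) for the lambda-stopping problem."""
    g = dict(STOP)
    for _ in range(sweeps):
        g = {x: min(CONT[x] - lam + ALPHA * expect(g, x), STOP[x]) for x in CONT}
    return g


def solve_regenerative(sweeps=5000):
    """Value iteration on equation (7) for the regenerative problem."""
    f = {x: 0.0 for x in CONT}
    for _ in range(sweeps):
        f = {x: min(CONT[x] + ALPHA * expect(f, x), STOP[x] + f[X0]) for x in CONT}
    return f


def find_lambda_star(lo, hi, steps=60):
    """Bisection on the decreasing function V(lam) = G(x0, lam)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if solve_stopping(mid)[X0] > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_star = find_lambda_star(min(CONT.values()), max(CONT.values()) + 1.0)
F = solve_regenerative()
G = solve_stopping(lam_star)
stop_set_lambda = {x for x in CONT
                   if STOP[x] <= CONT[x] - lam_star + ALPHA * expect(G, x)}
stop_set_regen = {x for x in CONT
                  if STOP[x] + F[X0] <= CONT[x] + ALPHA * expect(F, x)}
```

In this toy problem both formulations stop in the same states, and $\lambda^*$ agrees with $(1-\alpha)F(x^0)$, as the theorem asserts.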

Thus the regenerative stopping problem can be solved by solving the simpler stopping problems if the right $\lambda$ is used. Furthermore, if the monotone condition holds for the $\lambda$-stopping problem, then the Monotone Stopping Theorem can be applied. Thus, our computational approach is to solve a sequence of $V(\lambda)$ problems until a $\lambda^*$ satisfying $V(\lambda^*) = 0$ is found.

We begin by observing that $V_\pi(\lambda)$, the expected cost of the $\lambda$-stopping problem using a policy $\pi$, is

$$V_\pi(\lambda) = E[C_\pi] - \lambda\,E[1-\alpha^T]/(1-\alpha) \qquad (8)$$

where T is the random period where we choose to stop and $C_\pi$ is the original (without subtracting $\lambda$) discounted cost up to and including the cost of stopping using the policy $\pi$. Our convention is that the periods are numbered starting from zero, so that if T is the random period we stop then we have gone T periods until stopping. Since $(1-\alpha^T)/(1-\alpha) = 1 + \alpha + \dots + \alpha^{T-1}$ for $T \ge 1$, and $= 0$ for $T = 0$, the interpretation of $E[1-\alpha^T]/(1-\alpha)$ is the expected discounted number of periods until stopping.
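The identity behind this interpretation is easy to confirm numerically. In the sketch below the function names are hypothetical, and the stopping-time distribution (stop in period 2 with probability .41, in period 3 with probability .59) is an assumption taken from the worked example later in this section.

```python
def discounted_periods(T, alpha):
    """1 + alpha + ... + alpha**(T-1); zero when T = 0."""
    return sum(alpha ** t for t in range(T))


def closed_form(T, alpha):
    """The same quantity written as (1 - alpha**T) / (1 - alpha)."""
    return (1.0 - alpha ** T) / (1.0 - alpha)


ALPHA = 0.98
STOP_DIST = {2: 0.41, 3: 0.59}   # assumed distribution of the stopping period T

expected_time = sum(prob * discounted_periods(T, ALPHA)
                    for T, prob in STOP_DIST.items())
```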

The following proposition is needed for the regenerative stopping algorithm. The proof is not given since it is similar to that for the analogous result in the average cost case, which is proven in [14].

Proposition 1. V is a decreasing, finite-valued, and concave function of $\lambda$.

Since V is concave it is known that the right and left hand derivatives exist everywhere for $\lambda > 0$. Furthermore

$$V'_-(\lambda) \ge -E[1-\alpha^T]/(1-\alpha) \ge V'_+(\lambda)$$

where $V'_+$ and $V'_-$ are the right and left hand derivatives of V, and T is the time we stop using a $\lambda$-optimal policy.

In [14] two other propositions were established which respectively proved an alternative optimality condition and showed that the solutions improve as successive $\lambda$-stopping problems are solved. Instead of using those results we will establish a better result which gives an error bound for a non-optimal policy. The proof is given for the discount case, and a similar result can be obtained for the average cost case. As a preliminary to Proposition 2, we observe that $F^\alpha_\pi(x^0)$, the expected discounted cost of the regenerative stopping problem using policy $\pi$ and starting from state $x^0$, satisfies

$$F^\alpha_\pi(x^0) = E[C_\pi] + E[\alpha^T]\,F^\alpha_\pi(x^0) \qquad (9)$$

where T and $C_\pi$ are as defined in (8). From (8), $E[C_\pi] = V_\pi(\lambda) + \lambda\,E[1-\alpha^T]/(1-\alpha)$, and we substitute this equation into (9) and rearrange to obtain

$$(1-\alpha)\,F^\alpha_\pi(x^0) = \lambda + V_\pi(\lambda)/(E[1-\alpha^T]/(1-\alpha)). \qquad (10)$$
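The rearrangement leading to (10) can be written out step by step. The shorthand $\bar T$ for the expected discounted time is introduced here only for this derivation.

```latex
% Shorthand (an assumption of this note): \bar T = E[1-\alpha^T]/(1-\alpha),
% so that 1 - E[\alpha^T] = (1-\alpha)\,\bar T.
\begin{align*}
F^\alpha_\pi(x^0)\,\bigl(1 - E[\alpha^T]\bigr) &= E[C_\pi]
  && \text{from (9)} \\
(1-\alpha)\,\bar T\,F^\alpha_\pi(x^0) &= V_\pi(\lambda) + \lambda\,\bar T
  && \text{using (8)} \\
(1-\alpha)\,F^\alpha_\pi(x^0) &= \lambda + V_\pi(\lambda)/\bar T
  && \text{which is (10).}
\end{align*}
```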

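Proposition 2. Let $\lambda_0 < \lambda_1$ be such that $V(\lambda_0) \ge 0$ and $V(\lambda_1) \le 0$. Let $\pi_0$, $\pi_1$ and $\pi^*$ be the $\lambda_0$-optimal, the $\lambda_1$-optimal and the $\lambda^*$-optimal policy of Theorem 2 respectively. Then

$$(1-\alpha)\,\bigl(F^\alpha_{\pi_0}(x^0) - F^\alpha_{\pi^*}(x^0)\bigr) \le \frac{V(\lambda_0)\,[T(\lambda_1) - T(\lambda_0)]}{T(\lambda_1)\,T(\lambda_0)}$$

and

$$(1-\alpha)\,\bigl(F^\alpha_{\pi_1}(x^0) - F^\alpha_{\pi^*}(x^0)\bigr) \le \frac{-V(\lambda_1)\,[T(\lambda_1) - T(\lambda_0)]}{T(\lambda_1)\,T(\lambda_0)},$$

where $T(\lambda_0) = E[1-\alpha^T]/(1-\alpha)$ is the expected discounted time until stopping using the policy $\pi_0$ (T depends on $\pi_0$). An analogous definition applies to $T(\lambda_1)$.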

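Proof. By Proposition 1, V is concave and $-T(\lambda_1)$ is a supporting hyperplane at $\lambda_1$. Therefore for $\lambda_0 \le \lambda \le \lambda_1$, $V(\lambda) \ge V(\lambda_0) + (\lambda-\lambda_0)(-T(\lambda_1))$, since the one-sided derivatives of V are greater than or equal to $-T(\lambda_1)$ for $\lambda$ between $\lambda_0$ and $\lambda_1$. Since $\lambda^*$ lies between $\lambda_0$ and $\lambda_1$, $0 = V(\lambda^*) \ge V(\lambda_0) + (\lambda^*-\lambda_0)(-T(\lambda_1))$. Dividing by $-T(\lambda_1)$ yields

$$0 \le V(\lambda_0)/(-T(\lambda_1)) + (\lambda^*-\lambda_0) - V(\lambda_0)/T(\lambda_0) + V(\lambda_0)/T(\lambda_0).$$

We now apply (10), where $\pi = \pi_0$ and $\lambda = \lambda_0$, and $\lambda^* = (1-\alpha)\,F^\alpha_{\pi^*}(x^0)$, to obtain

$$0 \le -\frac{V(\lambda_0)}{T(\lambda_1)} + (1-\alpha)\,\bigl(F^\alpha_{\pi^*}(x^0) - F^\alpha_{\pi_0}(x^0)\bigr) + \frac{V(\lambda_0)}{T(\lambda_0)},$$

which establishes the first inequality. The second is established in a similar manner starting from the equation

$$V(\lambda) \ge V(\lambda_1) + (\lambda-\lambda_1)(-T(\lambda_0)) \quad \text{for } \lambda_0 \le \lambda \le \lambda_1. \qquad \text{Q.E.D.}$$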
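For completeness, the "similar manner" for the second inequality can be spelled out as follows. This is a sketch that assumes $T(\lambda_0) > 0$ and uses (10) with $\pi = \pi_1$ and $\lambda = \lambda_1$.

```latex
\begin{align*}
0 = V(\lambda^*) &\ge V(\lambda_1) + (\lambda^* - \lambda_1)\,(-T(\lambda_0))
  && \text{(starting equation at } \lambda = \lambda^*) \\
\lambda_1 - \lambda^* &\le -V(\lambda_1)/T(\lambda_0)
  && \text{(divide by } T(\lambda_0) > 0) \\
(1-\alpha)\,F^\alpha_{\pi_1}(x^0) - \lambda^*
  &= \lambda_1 - \lambda^* + V(\lambda_1)/T(\lambda_1)
  && \text{(by (10))} \\
  &\le -V(\lambda_1)\,\frac{T(\lambda_1) - T(\lambda_0)}{T(\lambda_1)\,T(\lambda_0)}.
\end{align*}
```

Since $\lambda^* = (1-\alpha)\,F^\alpha_{\pi^*}(x^0)$ by Theorem 2, the left side of the last two lines is the scaled cost difference between $\pi_1$ and the optimal policy.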

As a corollary to Proposition 2 we see that if $T(\lambda_0) = T(\lambda_1)$ then both the $\lambda_0$-optimal and $\lambda_1$-optimal policies are optimal. In the discounted case, Proposition 1 says that V is finite and decreasing, so that there will always exist a $\lambda^*$ such that $V(\lambda^*) = 0$. It could be the case that the expected time until stopping for the optimal policy is infinite.

The Regenerative Stopping Algorithm

Step 0.A. Find a $\lambda_0$ which is less than $\lambda^*$, where by Theorem 2 $\lambda^* = (1-\alpha)\,F^\alpha(x^0)$. It is desirable that $\lambda_0$ be as large as possible. We solve the $\lambda_0$-stopping problem and let $\pi$ be the optimal policy for that problem. Since $\lambda_0 \le \lambda^*$, $V(\lambda_0) \ge 0$. If a mistake is made and $\lambda_0$ is greater than $\lambda^*$, then $V(\lambda_0) < 0$ and $\lambda_0$ can be changed until $V(\lambda_0) \ge 0$.

Step 0.B. Set $\lambda_1 = \lambda_0 + V(\lambda_0)/(E[1-\alpha^T]/(1-\alpha)) = (1-\alpha)\,F^\alpha_\pi(x^0) \ge \lambda^*$, where $\pi$ is the optimal policy of the $\lambda_0$-stopping problem. We solve the $\lambda_1$-stopping problem. Since $\lambda_1 \ge \lambda^*$, $V(\lambda_1) \le 0$.

Step 1. We are now in the general case where we have solved a $\lambda_0$-stopping problem and a $\lambda_1$-stopping problem, where $\lambda_0 \le \lambda^*$ and $\lambda_1 \ge \lambda^*$. The new $\lambda$-stopping problem to be solved is given by $\lambda^{n+1} = \min(\bar\lambda,\ (1-\alpha)\,F^\alpha_\pi(x^0))$, where $\pi$ is the best policy determined to date, and $\bar\lambda = a\lambda_B + (1-a)\lambda_A$ where $0 < a < 1$. The subscript B stands for bisection and the subscript A stands for approximation. Computational experience suggests choosing a low value of a, since the approximation is quite accurate. We have $\lambda_B = \tfrac12\lambda_0 + \tfrac12\lambda_1$, and $\lambda_A$ is the $\lambda$ such that $V_A(\lambda) = 0$, where $V_A(\lambda)$ is based on the four equations:

$$V_A(\lambda_0) = V(\lambda_0), \quad V_A'(\lambda_0) = -T(\lambda_0), \quad V_A(\lambda_1) = V(\lambda_1), \quad \text{and} \quad V_A'(\lambda_1) = -T(\lambda_1) \qquad (11)$$

where $T(\lambda_0)$ and $T(\lambda_1)$ are defined in Proposition 2. These equations determine the coefficients of the cubic approximation $V_A(\lambda) = B_0 + B_1\lambda + B_2\lambda^2 + B_3\lambda^3$. The conditions on the derivatives are based on Proposition 1. The $\lambda^{n+1}$-stopping problem is solved, and $\lambda^{n+1}$ replaces $\lambda_1$ if $V(\lambda^{n+1}) < 0$ and replaces $\lambda_0$ if $V(\lambda^{n+1}) \ge 0$. We check Proposition 2 to see if the error term is sufficiently small. If not, return to Step 1.
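One pass through the choice of $\lambda^{n+1}$ in Step 1 can be sketched as below. The Hermite form of the cubic and the bisection used to locate its root are implementation choices assumed here (the paper specifies only the four conditions (11)); the numerical inputs are the values reported in the worked example that follows.

```python
def hermite_cubic(l0, v0, t0, l1, v1, t1):
    """Cubic V_A matching values v0, v1 and slopes -t0, -t1, i.e. conditions (11)."""
    h = l1 - l0

    def v_a(lam):
        s = (lam - l0) / h
        h00 = 2 * s ** 3 - 3 * s ** 2 + 1
        h10 = s ** 3 - 2 * s ** 2 + s
        h01 = -2 * s ** 3 + 3 * s ** 2
        h11 = s ** 3 - s ** 2
        return v0 * h00 + h * (-t0) * h10 + v1 * h01 + h * (-t1) * h11

    return v_a


def root_by_bisection(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) >= 0 >= f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def next_lambda(l0, v0, t0, l1, v1, t1, a, best_scaled_cost):
    """Return (lambda^{n+1}, lambda_A) as described in Step 1."""
    lam_b = 0.5 * (l0 + l1)                              # bisection point
    lam_a = root_by_bisection(hermite_cubic(l0, v0, t0, l1, v1, t1), l0, l1)
    lam_bar = a * lam_b + (1.0 - a) * lam_a
    return min(lam_bar, best_scaled_cost), lam_a


# Values from the worked example: interval [2.5, 2.734] and weight a = .1.
best = 2.5 + 0.81313 / 2.54663        # (1 - alpha) F_pi(x0) from equation (10)
lam_next, lam_a = next_lambda(2.5, 0.81313, 2.54663,
                              2.734, -0.44225, 8.57073, 0.1, best)
```

A small weight a keeps the step close to the root of the cubic while still guaranteeing that the bracketing interval shrinks.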

Comment. A value of a > 0 in Step 1 assures that the "interval of uncertainty" goes to zero. For further discussion of the Regenerative Stopping Algorithm the reader is referred to [14].


We will apply our computational procedure to the investigation model of Kaplan. In this numerical example, there are only two cost outcomes, zero and six. If the state is 1 then sixty percent of the time the cost is 0 and forty percent of the time it is 6. If the state is 2, the costs 0 and 6 are equally likely. The discount factor $\alpha$ is .98 and p, the transition probability from state 1 to state 1, is .9. This numerical example has the same number of cost outcomes as Kaplan's. The parameter values are changed since the monotone condition did not hold for the $\lambda$-stopping problem with his parameters.

For our example the cost of continuing, C(x,1), is given by (3) and is $x m_1 + (1-x) m_2 = 2.4x + (1-x)3 = 3 - .6x$. The cost of stopping, C(x,2), is K = 1. In the $\lambda$-stopping formulation we do not add the cost $m_1$ to K since the convention is to assume that stopping is instantaneous. The transition probabilities are given by equation (5).
We next consider the monotone condition and determine the set of $\lambda$-stopping problems such that the monotone condition holds. Given $\lambda$, the set B is $\{x : 1 \le 3 - .6x - \lambda + .98\}$, so that B is always of the form $\{x : x \le c\}$. Therefore we need to determine those states x such that n(x), the next state given an observed cost of 0, is larger than x. These are the states that potentially can cause us to leave the set B once it is entered and thus violate the monotone condition. The situation where the observed cost is 6 does not need to be considered since the next state always has a lower value when the observed cost is 6 than when it is 0. From (5) we have that $n(x) = .54x/(.5 + .1x)$, and therefore $n(x)/x$ is decreasing in x. The x such that n(x) = x is .4, so that for $x \ge .4$, $n(x) \le x$. Our claim is that the monotone condition will hold when B is of the form $\{x : x \le c\}$ and $c \ge .4$. The proof is that if $x \in B$ then $n(x) \le n(c)$ since n is increasing, and $n(c) \le c \in B$ since $c \ge .4$. In terms of $\lambda$ the requirement is that $1 \le 3 - .24 - \lambda + .98$ and therefore that $\lambda \le 2.74$.

We begin Step 0.A of the algorithm by setting $\lambda_0 = 2.5$. This value of $\lambda_0$ was simply a reasonable value that we hope will be less than $\lambda^*$. The monotone condition holds, so that Theorem 1 says that the $\lambda_0$-optimal policy is to stop for $x \le (3.98 - 2.5 - 1.0)/.6 = .8$.
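The threshold c and the fixed point of n(x) follow directly from the example's parameters. A small sketch, with hypothetical function names, that reproduces the numbers quoted above:

```python
ALPHA, K = 0.98, 1.0
M1, M2 = 2.4, 3.0


def stop_threshold(lam):
    """Largest x in B = {x : K <= x*M1 + (1-x)*M2 - lam + ALPHA*K}."""
    return (M2 - lam - (1.0 - ALPHA) * K) / (M2 - M1)


def n(x):
    """Next state after an observed cost of 0, from equation (5)."""
    return 0.54 * x / (0.5 + 0.1 * x)


c_start = stop_threshold(2.5)      # threshold for lambda_0 = 2.5
c_limit = stop_threshold(2.74)     # threshold at the largest admissible lambda
```

The monotone condition requires the threshold to be at least the fixed point .4 of n, which is where the bound $\lambda \le 2.74$ comes from.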

The next step is to calculate V(2.5). This is done by calculating $V^i(2.5)$, the expected cost up to and including period i of the $\lambda$-stopping problem, successively for i = 0, 1, 2, ....

In the zero period we know that x = 1 with probability 1 and $C(1,1) - \lambda = 3 - .6 - 2.5 = -.1$. Therefore, $V^0(2.5) = -.1$. In the first period we know that x = .9 with probability 1 and $C(.9,1) - \lambda = -.04$. When we include the discount factor of .98 we have that $V^1(2.5) = -.1392$.

In the first period we observe a cost of zero with probability .59 = [(.9)(.6) + (.1)(.5)] and a cost of six with probability .41. Therefore in the second period x = .82373 with probability .59 and x < .8 with probability .41.

Consequently $V^2(2.5) = -.1392 + (.59(.00576) + .41(1))(.98)^2 = .25783$.

In period three x < .8 with probability one and $V = V^3(2.5) = .25783 + (.59)(.98)^3 = .81313$. The expected discounted time until stopping is 1 + .98 + .56663 = 2.54663. Since V(2.5) > 0, $\lambda_0 < \lambda^*$ and we can proceed to Step 0.B.
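The period-by-period calculation above can be reproduced by recursing over the possible cost histories under the threshold policy. This sketch is an illustration, not the authors' code; the depth cap is a safeguard added here, and it truncates the recursion (by forcing a stop) for thresholds that are reached only slowly.

```python
ALPHA, P_STAY, K = 0.98, 0.9, 1.0
M1, M2 = 2.4, 3.0
F1 = {0: 0.6, 6: 0.4}
F2 = {0: 0.5, 6: 0.5}


def evaluate(x, lam, c, depth=0, max_depth=20):
    """Return (expected cost, expected discounted time) when stopping for x <= c."""
    if x <= c or depth == max_depth:
        return K, 0.0
    cost = x * M1 + (1.0 - x) * M2 - lam     # C(x,1) - lambda
    time = 1.0
    for y in F1:
        prob = x * F1[y] + (1.0 - x) * F2[y]
        nxt = P_STAY * x * F1[y] / prob      # equation (5)
        v, t = evaluate(nxt, lam, c, depth + 1, max_depth)
        cost += ALPHA * prob * v
        time += ALPHA * prob * t
    return cost, time


V, T = evaluate(1.0, 2.5, 0.8)
```

For $\lambda_0 = 2.5$ the recursion ends after at most three periods, which is why the hand calculation is short.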

The formula for $\lambda_1$ in Step 0.B has a value greater than 2.74, the largest value for which the monotone condition holds. Therefore we will instead set $\lambda_1 = 2.734$, very close to the upper bound, and hope that $V(2.734) \le 0$. The value of V(2.734) is found to be -.44225 and the expected discounted time until stopping is 8.57073. We proceed to Step 1.

A cubic fit $V_A$ is made on the interval [2.5, 2.734] using (11), and $V_A(\lambda) = 0$ for $\lambda = 2.67667$. Therefore $\bar\lambda = .1\lambda_B + .9\lambda_A = 2.67070$. We solve the 2.67070-stopping problem and V(2.67070) = .04109 and the expected discounted time until stopping is 6.67588. A cubic fit is made in the interval [2.67070, 2.734] and $V_A(\lambda) = 0$ for $\lambda = 2.67677$, and $\bar\lambda = 2.67933$. The value of $(1-\alpha)\,F^\alpha_\pi(x^0)$ is 2.67070 + .04109/6.67588 = 2.67686, so that $\lambda^{n+1} = 2.67686$. We solve the 2.67686-stopping problem and V(2.67686) = -.00046 and the expected discounted time until stopping is 6.84113. At this point we apply Proposition 2 with $\lambda_0 = 2.67070$ and $\lambda_1 = 2.67686$. The value of

$$F^\alpha_\pi(x^0) - F^\alpha_{\pi^*}(x^0) \le \frac{1}{.02}\cdot\frac{(.00046)(6.84113 - 6.67588)}{(6.67588)(6.84113)} = .00008.$$

Since $F^\alpha_\pi(x^0) = \frac{1}{.02}(2.67686 - .00006) = 133.84$, the cost of policy $\pi$ is within .00005 percent of the true minimum, $F^\alpha_{\pi^*}(x^0)$. Without Proposition 2 we would simply compare the lowest cost policy with the known lower bound. The cost of the 2.67686-stopping problem is by (10) 2.67686 - (.00046/6.84113) = 2.67679 and the lower bound for $\lambda^*$ is 2.6707. The percent error is .228.
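The arithmetic in this paragraph can be retraced with equation (10) and the second inequality of Proposition 2. The helper names below are hypothetical; the inputs are the values reported above.

```python
ALPHA = 0.98


def scaled_cost(lam, v, t):
    """Equation (10): (1 - alpha) * F_pi(x0) for the lambda-optimal policy."""
    return lam + v / t


def prop2_bound(v1, t0, t1):
    """Bound on (1 - alpha) * (F_pi1(x0) - F_pi*(x0)) from Proposition 2."""
    return -v1 * (t1 - t0) / (t1 * t0)


LAM0, T0 = 2.67070, 6.67588
LAM1, V1, T1 = 2.67686, -0.00046, 6.84113

cost_rate = scaled_cost(LAM1, V1, T1)             # about 2.67679
total_cost = cost_rate / (1.0 - ALPHA)            # F_pi(x0), about 133.84
bound_on_F = prop2_bound(V1, T0, T1) / (1.0 - ALPHA)
crude_percent_error = 100.0 * (cost_rate - LAM0) / LAM0
```

The Proposition 2 bound is several orders of magnitude tighter than the crude comparison with the lower bound $\lambda_0$.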

The policy is to stop if and only if the probability of being in state 1 is below .50523. This policy could be obtained by Brown's procedure of recursive sets of rules.

Applying rule 8 our action is to continue, and if we observe a cost of 6 then our next rule is 19. Rule 19 says to continue, and if the next observed cost is 0 then we use rule 20. Rule 20 says to continue, and if the next observed cost is 6 then we use rule 25. Rule 25 says to stop and then use rule 1, etc. These rules have the property that when rule 25 is reached, then the state of the system is less than .50523.

The computations involved in adding a new rule in Brown's algorithm are comparable to the work of performing one iteration of the Regenerative Stopping Algorithm, as the main effort for both is ascertaining the current optimal return function or its surrogate. As the number of rules generated to solve the example problem must have been at least 25, the Regenerative Stopping Algorithm does quite well in comparison.

5.  THE  GENERALIZED  KAPLAN  MODEL

We  consider  an  n-diaenslonal  version  of  Kaplan's  model  where  systea  j, 

1  jc  j  £  n,  la  a  two-state  production  system  as  before,  and  equations  (1),  (2), 
and  (5)  apply  to  each  systea  J.  The  probability  laws  of  the  n  systems  are 
assiiaed  to  be  Independent.  The  decisions  stop  and  continue  apply  to  all  n 
systems.  With  the  decision  continue  the  total  cost  Is  the  sum  of  n  different 
continue  costs  as  described  by  (3).  With  the  decision  stop  the  total  cost  Is 
K  ♦  ■, (J)  where  m,(j)  Is  the  expected  cost  when  systea  J  is  In  state  1  with 

J-1 

probability  one.  This  problem  Is  n-dlmcnslonal  and  the  state  of  the  systea  Is 

(x(l) .  x(n))  where  x(J)  Is  the  probability  that  system  J  Is  in  state  one. 

Value iteration, policy iteration, and the Regenerative Stopping Algorithm can
be applied to the generalized model, and each will soon fall to the curse of
dimensionality. In the case of the Regenerative Stopping Algorithm the
difficulty lies in calculating $V(\lambda)$. For example, when n reaches 10 and
each $x(j)$, $1 \le j \le n$, may take on up to 30 values, the problem is
computationally intractable. However, the Regenerative Stopping Algorithm has
the advantage that $V(\lambda)$ for a given $\lambda$ is a number, so that it
is feasible to estimate $V(\lambda)$ by simulation, and this will be carried
out. Simulation cannot be applied for value iteration or policy iteration since
each iteration requires that an optimal return function be determined.
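The point that a single number can be estimated by simulation, together with a standard error, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the paper's estimator of $V(\lambda)$ or $T(\lambda)$: it estimates the mean undiscounted cycle length of the rule "stop when the weighted belief sum falls to c or below" for the 20-system example of this section. The probabilities (.9 of staying in control, .4 and .5 of the high cost in and out of control) are inferred from the update n(x) = .54x/(.5 + .1x) and the cost term 3 - .6x given later. The starting belief of one and all function names are assumptions.

```python
import math
import random

N = 20              # number of systems, as in the paper's example
P_STAY = 0.9        # assumed P(in-control system stays in control)
P_HIGH_IN = 0.4     # assumed P(high cost | in control)
P_HIGH_OUT = 0.5    # assumed P(high cost | out of control)


def update(x, high_cost):
    """Belief that a system is in control next period, given one observed cost."""
    like_in = P_HIGH_IN if high_cost else 1.0 - P_HIGH_IN
    like_out = P_HIGH_OUT if high_cost else 1.0 - P_HIGH_OUT
    posterior = like_in * x / (like_in * x + like_out * (1.0 - x))
    return P_STAY * posterior


def cycle_length(c, rng):
    """Number of continue periods until sum_j j * x(j) <= c (c >= 84 assumed)."""
    in_control = [True] * N
    x = [1.0] * N                      # assumed state just after a correction
    periods = 0
    while sum((j + 1) * x[j] for j in range(N)) > c:
        for j in range(N):
            p_high = P_HIGH_IN if in_control[j] else P_HIGH_OUT
            x[j] = update(x[j], rng.random() < p_high)
            if in_control[j] and rng.random() >= P_STAY:
                in_control[j] = False  # an out-of-control system stays out
        periods += 1
    return periods


def estimate(c, samples, seed=0):
    """Sample mean of the cycle length and the standard error of that mean."""
    rng = random.Random(seed)
    draws = [cycle_length(c, rng) for _ in range(samples)]
    mean = sum(draws) / samples
    variance = sum((d - mean) ** 2 for d in draws) / (samples - 1)
    return mean, math.sqrt(variance / samples)


if __name__ == "__main__":
    print(estimate(c=118.0, samples=200))
```

The standard error returned alongside the mean is the kind of quantity that the conservative five-standard-error adjustments later in this section rely on.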

Before considering a higher dimension problem we apply simulation to the
example problem of Section 4. The Regenerative Stopping Algorithm applies
except that in Step 1, x"*'' > X since we would not know
$(1-\alpha) F_{\lambda}(x^0)$ with certainty. The main difference comes after
the iterations have been completed and we evaluate the results using
Proposition 2.

For each iteration we generate a sample of size 1000. Variations of the
variance reduction methods described in Wagner [21] were used, and they
resulted in a reduction of variance on the order of 5 to 30 times as compared
to the completely independent case.


The results of the simulation are:

Iteration   $\lambda$   $V(\lambda)$   Standard Deviation         $T(\lambda)$   Standard Deviation
                                       of Error of $V(\lambda)$                  of Error of $T(\lambda)$

    1       2.5           .81313         .00007                   2.56663          .00278
    2       2.736        -.63986         .00172                   8.55863          .02632
    3       2.67091       .03893         .00111                   6.71135          .02136
    4       2.67923      -.01618         .00116                   6.90592          .02197

In the algorithm the values of the standard deviations are not used. However,
they are used when we apply Proposition 2, which we now describe. We let
$\lambda_0 = 2.67091$ and $\lambda_1 = 2.67923$. For $\lambda_1$ we set
$V(\lambda_1) = -.01618 - 5(.00116) = -.022$. For $T(\lambda_1)$ and
$T(\lambda_0)$ we use $6.90592 - 5(.02197)$ and $6.71135 - 5(.02136)$
respectively. Thus in the calculations of $V(\lambda_1)$, $T(\lambda_1)$,
$T(\lambda_0)$ we are conservative and are using a value of five standard
errors from the estimated mean. In order to calculate
$[T(\lambda_1) - T(\lambda_0)]$ we make an additional simulation of size 100
to calculate this quantity directly. The result here is that
$T(\lambda_1) - T(\lambda_0)$ has a mean of .26799 and a standard error of
.02391. Thus with very high probability the error is at most

$$\frac{1}{.02} \cdot \frac{(.022)(.38794)}{(6.60465)(6.79607)} = .00950 .$$


Since $F_{\lambda}(x^0) \approx 2.679 \times 50$, the percentage of error is,
with very high probability, less than or equal to .007 percent of the optimum.
In this case, we can calculate the true percentage error for the policy which
is optimal for the 2.67923-stopping problem using the non-simulation
algorithm. The true percentage of error was .0001.
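The arithmetic of this bound can be checked with a few lines of Python. The sketch below only reproduces the printed numbers. Reading the factor 1/.02 as 1/(1 - α) with α = .98 is an inference from the figures in this section, and the function and argument names are hypothetical rather than the notation of Proposition 2.

```python
def error_bound(v1_abs, t_diff, t0, t1, alpha=0.98):
    """Evaluate (1 / (1 - alpha)) * v1_abs * t_diff / (t0 * t1).

    v1_abs : conservative magnitude of V(lambda_1)
    t_diff : conservative value of T(lambda_1) - T(lambda_0)
    t0, t1 : conservative values of T(lambda_0) and T(lambda_1)
    """
    return (1.0 / (1.0 - alpha)) * (v1_abs * t_diff) / (t0 * t1)


# Values exactly as printed in the text for the one-system example.
bound = error_bound(0.022, 0.38794, 6.60465, 6.79607)

# Error as a percentage of the optimum, taken in the text to be about 2.679 * 50.
percent = 100.0 * bound / (2.679 * 50)

if __name__ == "__main__":
    print(round(bound, 4), round(percent, 3))
```

The result agrees with the printed .00950 to within rounding, and with the stated .007 percent.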

The n-dimensional example that we will consider has n = 20, and the
probability law of each independent system is the same as in the example
problem we have been considering. For system j the costs 0 and 6 are changed
to 0 and 6j with the same probabilities as before, so that system 1 is
precisely the example problem. We set the cost of stopping, K, equal to 160.

As before, we must consider the monotone condition and determine the set of
$\lambda$-stopping problems such that the monotone condition holds. Given
$\lambda$, the set B is

$$B = \Big\{ (x(1), \ldots, x(20)) : K \le \sum_{j=1}^{20} j\,(3 - .6\,x(j)) - \lambda + .98K \Big\},$$

so that B is of the form

$$\Big\{ (x(1), \ldots, x(20)) : \sum_{j=1}^{20} j\,x(j) \le c \Big\}.$$

Therefore we need to determine those vectors x such that n(x), the next vector
given that each system observed a cost of 0, has

$$\sum_{j=1}^{n} j\,n(x(j)) \le \sum_{j=1}^{n} j\,x(j).$$

For each system j, $n(x(j)) = .54\,x(j) / (.5 + .1\,x(j))$. Our claim is that
the monotone condition will hold if $c \ge \sum_{j=1}^{20} .4j = 84$, and the
proof is as follows.

Let $x \in B$ so that $\sum_{j=1}^{20} j\,x(j) \le c$. Now let z be a number
such that

$$\sum_{j=1}^{20} j\,z = \sum_{j=1}^{20} j\,x(j).$$

From the one-dimensional case in Section 4 we know that
$n(z) \le n(c/210)$ since n is increasing, and $n(c/210) \le c/210$ when
$c \ge 84$, so that $210\,n(z) \le c$. The proof is complete when we show that

$$\sum_{j=1}^{20} j\,n(z) \ge \sum_{j=1}^{20} j\,n(x(j)).$$

We have

$$\sum_{j=1}^{20} j\,n(z) = \sum_{j=1}^{20} j\,n(x(j)) + \sum_{j=1}^{20} j\,[n(z) - n(x(j))].$$

By taking derivatives one can show that n is concave, and the concavity of n
implies that

$$\sum_{j=1}^{20} j\,[n(z) - n(x(j))] \ge \sum_{j=1}^{20} j\,n'(z)\,[z - x(j)] = 0,$$

since $\sum_{j=1}^{20} j\,[z - x(j)] = 0$ by construction, which completes the
proof.
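The two facts used in the proof can be spot-checked numerically. The Python sketch below is a check under the stated update $n(x) = .54x/(.5 + .1x)$ and the weights j = 1, ..., 20, and is not part of the paper's argument. The helper names and the random test vectors are illustrative assumptions.

```python
import random

WEIGHTS = range(1, 21)          # j = 1..20
TOTAL = sum(WEIGHTS)            # 210


def n(x):
    """Next belief for one system after observing a cost of 0."""
    return 0.54 * x / (0.5 + 0.1 * x)


def weighted(values):
    """sum_j j * values[j] for j = 1..20."""
    return sum(j * v for j, v in zip(WEIGHTS, values))


def monotone_holds(c, trials=2000, seed=1):
    """Check, on random vectors in B, the concavity step and closure of B."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.random() for _ in WEIGHTS]
        if weighted(x) > c:
            continue                        # x is not in B; skip it
        z = weighted(x) / TOTAL             # sum_j j*z equals sum_j j*x(j)
        nx = [n(v) for v in x]
        if weighted(nx) > TOTAL * n(z) + 1e-9:
            return False                    # concavity step would fail
        if weighted(nx) > c + 1e-9:
            return False                    # next vector would leave B
    return True


if __name__ == "__main__":
    print(n(0.4))                 # .4 is the fixed point of n, hence c = 84
    print(monotone_holds(84.0))
    print(monotone_holds(118.0))
```

The value .4 is where n(z) = z, which is why the threshold on c is .4 times 210.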

In terms of $\lambda$ the requirement is that
$K \le 630 - 50.4 - \lambda + .98K$, that is,
$\lambda \le 579.6 - .02K = 576.4$.
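The correspondence between $\lambda$ and the threshold c can be written out directly. The following Python sketch solves the defining inequality of B for c and recovers the bound 576.4. It uses the example's values K = 160, discount factor .98 and weight sum 210. The function names are hypothetical.

```python
def threshold_c(lam, K=160.0, alpha=0.98, weight_sum=210):
    """Largest c such that B = {x : sum_j j*x(j) <= c}.

    Obtained from K <= 3*weight_sum - 0.6*sum_j j*x(j) - lam + alpha*K.
    """
    return (3.0 * weight_sum - lam - (1.0 - alpha) * K) / 0.6


def max_lambda(K=160.0, alpha=0.98, weight_sum=210, c_min=84.0):
    """Largest lambda for which threshold_c(lambda) >= c_min."""
    return 3.0 * weight_sum - 0.6 * c_min - (1.0 - alpha) * K


if __name__ == "__main__":
    print(max_lambda())            # the bound stated in the text
    print(threshold_c(555.842))    # threshold for the final iterate below
```

Every value of $\lambda$ tried in the simulation that follows lies below this bound, so the monotone condition applies to each of them.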

The results of the simulation are:

Iteration   $\lambda$   $V(\lambda)$   Standard Error     $T(\lambda)$   Standard Error
                                       of $V(\lambda)$                   of $T(\lambda)$

    1       525.          124.07         .027               2.6484         .018
    2       575.         -132.674        .1588              8.2694         .0189
    3       555.219         3.3418       .0638              5.6715         .01148
    4       556.735        -5.2245       .0580              5.7820         .01318
    5       555.829         -.0829       .0299              5.7036         .00701
    6       555.785          .21977      .03006             5.7027         .00712
    7       555.842         -.18692      .0297              5.7143         .007429

Iterations 1-4 were based on a sample size of 250, while iterations 5-7
had a sample size of 1000. Iteration 7 was made since $\lambda = 555.829$
could not be used to implement Proposition 2: $V(\lambda)$ was within 5
standard deviations of 0, and therefore we are not sufficiently confident that
$555.829 > \lambda^*$.
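The five-standard-deviation screen described here can be expressed as a small Python check over the last three rows of the table. The sign convention (a negative $V(\lambda)$ indicates $\lambda > \lambda^*$) is read off the sentence above. The function name and the labels are hypothetical.

```python
# (iteration, lambda, estimated V(lambda), standard error of V(lambda))
ROWS = [
    (5, 555.829, -0.0829, 0.0299),
    (6, 555.785, 0.21977, 0.03006),
    (7, 555.842, -0.18692, 0.0297),
]


def classify(v, se, k=5.0):
    """Place lambda relative to lambda* when V is k standard errors from 0."""
    if v - k * se > 0.0:
        return "below"        # V confidently positive: lambda < lambda*
    if v + k * se < 0.0:
        return "above"        # V confidently negative: lambda > lambda*
    return "undecided"        # V is within k standard errors of 0


if __name__ == "__main__":
    for iteration, lam, v, se in ROWS:
        print(iteration, lam, classify(v, se))
```

Only iterations 6 and 7 pass the screen, one on each side of $\lambda^*$.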

We set $\lambda_0 = 555.785$ and $\lambda_1 = 555.842$. For Proposition 2 we
set $V(\lambda_1) = -.18692 - 5(.0297) = -.3354$,
$T(\lambda_0) = 5.7027 - 5(.00712) = 5.6671$, and
$T(\lambda_1) = 5.7143 - 5(.00743) = 5.6772$. We make an additional simulation
of sample size 400 to calculate $T(\lambda_1) - T(\lambda_0)$. The result is
that $T(\lambda_1) - T(\lambda_0)$ has a mean of .00447 and a standard error
of .002. Thus with very high probability the error is at most

$$\frac{1}{.02} \cdot \frac{(.3354)(.01447)}{(5.6671)(5.743)} = .00745 .$$


Since $F_{\lambda}(x^0) \approx 555 \times 50$, the percentage of error is, with very high probability,